Clustering under Approximation Stability
Authors
Abstract
A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired “target” clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those above, are NP-hard to optimize. So, this assumption by itself is not sufficient, assuming P ≠ NP, to achieve clusterings of low error via polynomial-time algorithms. In this paper, we show that we can bypass this barrier if we slightly extend this assumption to ask that for some small constant c, not only the optimal solution, but also all c-approximations to the optimal solution, differ from the target on at most some ε fraction of points; we call this (c, ε)-approximation-stability. We show that under this condition, it is possible to efficiently obtain low-error clusterings even if the property holds only for values c for which the objective is known to be NP-hard to approximate. Specifically, for any constant c > 1, (c, ε)-approximation-stability of k-median or k-means objectives can be used to efficiently produce a clustering of error O(ε) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.
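To make the (c, ε)-approximation-stability condition concrete, the following is a minimal brute-force sketch for the k-median objective on a tiny instance. It is not the paper's algorithm, which runs in polynomial time; it only enumerates every labeling to test the definition. The function names, the choice of centers from among the data points, and the use of "at most ε" for the error threshold are assumptions made for illustration.

```python
from itertools import permutations, product


def kmedian_cost(points, labels, k, dist):
    """k-median cost of a partition; each cluster's center is the best data point (assumption)."""
    total = 0.0
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if members:
            total += min(sum(dist(p, center) for p in members) for center in points)
    return total


def clustering_error(labels_a, labels_b, k):
    """Fraction of points on which two clusterings differ, under the best relabeling of clusters."""
    best = len(labels_a)
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] != b)
        best = min(best, mismatches)
    return best / len(labels_a)


def is_approximation_stable(points, target, k, c, eps, dist):
    """True if every labeling with cost <= c * OPT differs from target on at most an eps fraction."""
    labelings = list(product(range(k), repeat=len(points)))  # exponential: tiny inputs only
    costs = [kmedian_cost(points, lab, k, dist) for lab in labelings]
    opt = min(costs)
    return all(
        clustering_error(lab, target, k) <= eps
        for lab, cost in zip(labelings, costs)
        if cost <= c * opt
    )


def line_dist(p, q):
    """Distance between two points on the real line."""
    return abs(p - q)


# Two well-separated groups: only the target clustering is near-optimal.
separated = is_approximation_stable(
    [0.0, 1.0, 2.0, 100.0, 101.0, 102.0], [0, 0, 0, 1, 1, 1], 2, 1.5, 0.0, line_dist
)

# Evenly spaced points: several different partitions are optimal, so stability fails for small eps.
ambiguous = is_approximation_stable(
    [0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1], 2, 1.0, 0.1, line_dist
)
```

In the first instance any clustering other than the target is far more expensive than the optimum, so the condition holds. In the second, three different partitions all attain the optimal cost, so the condition fails for ε = 0.1.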
Similar resources
Thesis Proposal: Approximation Algorithms and New Models for Clustering and Learning
This thesis concerns two fundamental problems in clustering and learning: (a) the k-median and the k-means clustering problems, and (b) the problem of learning under adversarial noise, also known as agnostic learning. For k-median and k-means clustering we design efficient algorithms which provide arbitrarily good approximation guarantees on a wide class of datasets. These are datasets which sa...
Function Approximation Approach for Robust Adaptive Control of Flexible joint Robots
This paper is concerned with the problem of designing a robust adaptive controller for flexible joint robots (FJR). Under the assumption of weak joint elasticity, the FJR is first modeled and converted into singular perturbation form. The control law consists of a FAT-based adaptive control strategy and a simple correction term. The first term of the controller is used to stability of the slow dy...
Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis
Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided the data satisfy a natural stability notion. For example, Bilu and Linial (2010) and Awasthi et al. (2012) presented algorithms that output near-optimal solu...
Application of Pattern Recognition Algorithms for Clustering Power System to Voltage Control Areas and Comparison of Their Results
Finding the collapse-susceptible portion of a power system is one of the purposes of voltage stability analysis. This part, which is a voltage control area, is called the voltage weak area. Determining the weak area and adjacent voltage control areas is especially important for improving voltage stability. Designing an on-line corrective control requires the voltage weak area to be determi...
Journal title:
Volume, issue
Pages -
Publication date 2009